Abstract Global declines in biodiversity have become increasingly severe. Traditional monitoring
approaches for assessing marine species distributions and abundances are time consuming,
costly, and manpower intensive. Fortunately, rapid progress of sequencing technologies from
first-generation to high-throughput sequencing has resulted in improvements in experimental
techniques. These advances have accelerated rates of species discovery and identification,
enabling community-level biomonitoring — the ‘Biomonitoring 2.0’ framework. Simultaneous 
multispecies identifications in mixed-sample pools are now mainstream with DNA metabarcod- 
ing, upscaling monitoring from the individual specimen to the ecosystem scale. In this review, 
we examine the progress of DNA metabarcoding over the last decade in the characterisation of 
marine macrobiota to microbial communities. By melding molecular techniques and more tra- 
ditional taxonomic tools, this integrative Biomonitoring 2.0 approach is tailored to improve the 
overall effectiveness of biomonitoring. As such, we here assess its accuracy, expertise require- 
ment, general applicability, time, cost-effectiveness, and throughput for biomonitoring. We 
highlight various methodological challenges that must be considered during implementation, 
including completeness of reference databases, representativeness of sequencing read counts for 
quantitative estimates, and supplementation with environmental RNA for discerning live signals 
from legacy DNA. Finally, we conclude with an outlook on the enhanced Biomonitoring 2.0
framework for mass adoption by ecologists and managers, as well as the prospects of emerging 
rapid detection technologies for ecosystem surveillance. 


Keywords: Barcoding; Bioinformatics; Environmental DNA Metabarcoding; Environmental RNA;

Introduction


Species declines and advances in DNA sequencing 


Many species are predicted to go extinct before discovery and formal taxonomic description, as a 
result of the ongoing deterioration in ecosystem health and worsening biodiversity declines over
recent decades (Costello et al. 2013). Biodiversity losses have been at their highest in the last decade
(IPBES 2019), and the planet is facing a sixth mass extinction, the Anthropocene extinction event 
(Barnosky et al. 2011, Waters et al. 2016, Ceballos et al. 2020). This situation underscores the urgent 
need to monitor environmental responses and assess ecosystem health to take stock of extant bio- 
diversity so as to better formulate mitigation measures aimed at protecting Earth’s natural heritage, 
resources, and continued supply of ecosystem services. Of the 2.2 million marine species estimated 
globally, more than 90% remain to be discovered or are pending formal description (Appeltans et al. 
2012, Mora et al. 2011). This is because the assessment of species diversity with traditional methods 
requires direct organism observation and skilled taxonomic expertise for accurate identification 
and description, which is time consuming, costly (Miller 2007, deWitt & deWitt 2008, Carbayo &
Marques 2011), and increasingly difficult as traditional taxonomic skills decline (Hopkins &
Freckleton 2002, Agnarsson & Kuntner 2007, Drew 2011). Moreover, traditional bioassessments
sometimes employ highly variable methods for specimen examination at different taxonomic lev- 
els, and this inconsistency can produce results that are often not directly comparable across space 
and time (Friberg et al. 2011). Consequently, poor documentation of marine fauna, especially in 
biodiverse regions (Bouchet 2006), coupled with a large backlog of undescribed species has ren- 
dered most marine species unidentifiable and still unknown to science (Mora et al. 2011, 2013). 
Incomplete knowledge of species diversity hinders reliable biodiversity assessments and thus limits 
the effectiveness of management strategies to prevent further biodiversity loss (Isaac et al. 2004). 

Fortunately, molecular techniques offer an efficient and cost-effective way to increase rates of 
species discovery, thereby facilitating species identification with accurate taxonomic and genetic 
information (Hudson 2008, Wang et al. 2018). A particular example is the use of DNA barcoding, a 
technique that has been repeatedly demonstrated in the past two decades to be remarkably effective 
for species identification and discovery (Hebert et al. 2003, Hajibabaei et al. 2007, Ratnasingham 
& Hebert 2007, Goldstein & DeSalle 2011, Wang et al. 2018, Ip et al. 2019). Conceptually, DNA 
barcoding targets a standard gene region (e.g., cytochrome c oxidase subunit I, or COI, for most 
metazoans, Hebert et al. 2003), generating a short DNA sequence (otherwise known as ‘barcode’) 
that is matched to curated reference sequence databases containing previously barcoded sequences 
for species identification (Ekrem et al. 2007). 
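
The matching step described above can be illustrated with a minimal sketch. It assumes that the query and reference barcodes are already aligned to the same length, and it uses an illustrative 97% identity threshold; the function names, toy sequences, and threshold are assumptions for demonstration and do not come from any specific barcoding pipeline or database.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def identify(query: str, references: dict, threshold: float = 97.0):
    """Return (species, identity) for the best reference hit.

    The species is None when the best hit falls below the threshold
    (an assumed cut-off; suitable values differ between markers and taxa).
    """
    best_species, best_identity = None, 0.0
    for species, ref in references.items():
        identity = percent_identity(query, ref)
        if identity > best_identity:
            best_species, best_identity = species, identity
    if best_identity < threshold:
        return None, best_identity
    return best_species, best_identity


# Toy reference library with hypothetical species labels and sequences.
references = {
    "Species A": "ACGTACGTACGTACGTACGT",
    "Species B": "ACGTTCGAACGTTCGAACGT",
}
query = "ACGTACGTACGTACGTACGA"  # differs from Species A at one position

print(identify(query, references, threshold=90.0))
print(identify(query, references))  # stricter default threshold
```

In practice, matching is done with alignment-based search tools against curated databases; the sketch only shows why an incomplete or poorly curated reference library leaves a query without a species-level name.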

Over the last half century, rapid advances in molecular techniques and technologies have
followed the discovery of DNA structure (Watson & Crick 1953, Heather & Chain 2016). The 
emergence of first-generation sequencing technologies (Holley et al. 1965) led to the development 
of the chain termination method, or Sanger sequencing (Sanger et al. 1977), followed by next- 
generation (short-read, high-throughput) sequencing at the beginning of the twenty-first century. 
The most recent third-generation (long-read, high-throughput) sequencing uses single molecule 
real-time (SMRT) and Oxford Nanopore Technologies (ONT) (van Dijk et al. 2014). Together, 
next- and third-generation sequencing, also known as high-throughput sequencing (HTS), has 
recently replaced Sanger sequencing methods in many applications due to its cost-effectiveness 
and efficiency (Castro et al. 2020). These technological developments have expanded DNA’s util- 
ity, particularly for species discovery and identification, and this is evidenced in an exponential 
increase in the number of DNA barcoding studies published in the last two decades (~25,000%, 
Grant et al. 2021) (Figure 1). 

Figure 1 Line graph (right axis) showing the cumulative number of articles published in the literature
compiled from Web of Science and Scopus (n=5079). Stacked bar chart (left axis) showing the relative percentage
of exemplar first-, second-, and third-generation sequencing strategies used in each year. Publications from
years 2012 to 2022 were searched using the following keywords: marine, DNA barcoding, metabarcoding,
Sanger, Illumina, iSeq, MiSeq, HiSeq, NovaSeq, Nanopore, and PacBio.

Recognising that advanced molecular tools with sequencing technologies can revolutionise species
monitoring in the field of ecology, Baird & Hajibabaei (2012) introduced ‘Biomonitoring 2.0’
as a novel approach, employing next-generation sequencing to gather massive amounts of
information-rich biodiversity data for studying complex environmental and ecological relationships.
Following a decade of developments under the Biomonitoring 2.0 framework, it is timely to review 
the growing literature to highlight the current status, utility, and methodological challenges influ- 
encing the trajectory of this new era of environmental genomics. Despite the numerous advantages 
of utilising molecular tools for biomonitoring, there is a lack of standardised guidelines ensuring 
accuracy, reproducibility, and scope of use, partly due to the rapid expansion of the field. Validated 
genomics (DNA barcoding and metabarcoding) and accompanying bioinformatics protocols have 
yet to be established as most techniques are undergoing optimisation (Figures 1 and 2). Additionally,
one of the most desired aspects of environmental genomics — quantification of species abundance 
and biomass with sequence read counts from environmental DNA (eDNA) and bulk tissue samples 
— remains intensely debated in terms of its precision (Kelly 2016). Since most molecular protocols 
utilise DNA enrichment methods, they can introduce biases downstream during molecular process- 
ing, such as during gene amplification and library preparation (Figure 2). These numerous sources 
of bias remain unresolved and stifle DNA sequencing’s potential to become a staple biomonitoring
tool to support management frameworks (Evans et al. 2016).

In this review, we first provide a general outline of typical DNA barcoding and metabarcod- 
ing workflows used in marine studies (Figure 2). Although there are numerous ways to sample a 
wide range of environmental substrates for detecting, identifying, and characterising communi- 
ties of marine species, we focus on a selection of widely used methods for collecting specimens 
of various body sizes (microbial, meiofauna, macrofauna) and trace DNA signals (eDNA) across 
the water column of the coral reef environment (surface seawater, pelagic, and benthic environ- 
ment). Specifically, these commonly used sampling methods collect eDNA and organisms through 
seawater sampling, sediment grabs or coring, plankton tows, and direct capture with traps, nets,
and standardised sampling devices [e.g., Autonomous Reef Monitoring Structures (ARMS)]
for molecular analyses (Figure 2). We then discuss the hurdles faced during implementation of
Biomonitoring 2.0 and highlight strategies to improve accuracy and reproducibility to generate
meaningful ecological results. Finally, we examine prospects for implementation of this suite of
tools in an enhanced Biomonitoring 2.0 framework.

Figure 2 Overview of workflows that use common sampling tools to collect different sample types from the
marine environment for molecular analyses, as well as the downstream wet-lab, molecular, analytical, and
bioinformatic methods employed for species detection, identification, or community assessment of marine
biodiversity. (Diagram was created with sankeymatic (https://sankeymatic.com/build/) and icons were adapted
from Biorender (https://biorender.com/).)


Marine biomonitoring studies and global 
distribution of barcoded species 


All articles published from 2012 to 2022 with ‘Marine’ and ‘DNA barcod*’ or ‘metabarcod*’ in their 
title, abstract, and keywords were downloaded from Scopus and Web of Science (WoS) on 10 May 
2022, and the overlapping entries between citation databases were removed (Supplementary Material 1). 
Subsequently, the type of sequencing technology used for each study was searched across all articles’ 
title, abstract, and keywords, with the aim of compiling the relative proportions of sequencing tech- 
nology within each of the 10 years. Next, we only selected the most popular first-, second-, and third- 
generation sequencing technologies by confining the search with keywords ‘Sanger’; ‘Illumina’ or 
‘HiSeq’, ‘iSeq’, ‘MiSeq’, ‘NovaSeq’; and ‘Oxford Nanopore’, ‘Pac* Bio*’, and ‘long read’. These were
graphically represented by year with ggplot2 in RStudio v.1.4.1106 (R Core Team 2021). 
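
As an illustration of the keyword-based tally described above, the following sketch classifies bibliographic records by sequencing generation and computes yearly relative percentages. It is not the authors' script (their plots were produced in R with ggplot2); the regular expressions are assumed approximations of the listed search terms, and the example records are invented.

```python
import re
from collections import Counter, defaultdict

# Assumed regular-expression equivalents of the search keywords listed above.
PATTERNS = {
    "first": re.compile(r"\bsanger\b", re.I),
    "second": re.compile(r"\b(illumina|hiseq|iseq|miseq|novaseq)\b", re.I),
    "third": re.compile(r"oxford nanopore|\bpac\w*[ -]?bio\w*|long[ -]read", re.I),
}


def yearly_proportions(records):
    """records: iterable of (year, text drawn from title, abstract, and keywords).

    Returns {year: {generation: percent of keyword hits in that year}}.
    A record mentioning several generations is counted once for each.
    """
    counts = defaultdict(Counter)
    for year, text in records:
        for generation, pattern in PATTERNS.items():
            if pattern.search(text):
                counts[year][generation] += 1
    proportions = {}
    for year, tally in counts.items():
        total = sum(tally.values())
        proportions[year] = {g: 100.0 * tally[g] / total for g in PATTERNS}
    return proportions


# Invented example records for demonstration only.
records = [
    (2013, "COI barcodes obtained by Sanger sequencing"),
    (2013, "Illumina MiSeq metabarcoding of plankton"),
    (2021, "MiSeq eDNA survey of reef fishes"),
    (2021, "Oxford Nanopore MinION barcoding"),
    (2021, "Pacific Biosciences long-read amplicons"),
    (2021, "No sequencing platform stated"),
]
props = yearly_proportions(records)
print(props[2013])
```

Records that name no platform contribute nothing to the tally, so the percentages describe only those studies whose title, abstract, or keywords state the technology used.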

To track the ongoing progress of marine barcoding efforts worldwide, we first obtained a com- 
plete taxonomic list of 546,847 global marine species from the World Register of Marine Species 
(WoRMS Editorial Board 2022). This list was filtered to retain 429,635 Animalia records that had 
species-level epithets. The filtered marine species list was subsequently used to search against 
15 mitochondrial gene datasets downloaded from the MIDORI2 web server (GenBank249 version,
Longest database, downloaded 07 May 2022) (Leray et al. 2022) to compute the taxon coverage 
of mitochondrial barcodes for marine species available on GenBank. We then traced the metadata
of the relevant accession numbers for geographic origin of the sequence, which was based on
the locality information provided by the authors who submitted sequences to GenBank. With this
information, the global distribution of the number of barcoded species per country and the composition of
barcode genes per phylum were visualised using ggplot2 in RStudio v.1.4.1106 (R Core Team 2021).
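
The taxon coverage computation described above amounts to intersecting a species checklist with the species names present in each gene dataset. The sketch below shows this under simplifying assumptions: names are matched exactly as strings, and the checklist, gene labels, and species entries are invented placeholders rather than actual WoRMS or MIDORI2 content.

```python
def taxon_coverage(species_list, barcoded_by_gene):
    """Return (per-gene coverage, overall coverage) as percentages of the checklist.

    species_list: iterable of species names (the checklist).
    barcoded_by_gene: {gene label: iterable of species names with at least one sequence}.
    """
    species = set(species_list)
    if not species:
        raise ValueError("species list is empty")
    per_gene = {
        gene: 100.0 * len(species & set(names)) / len(species)
        for gene, names in barcoded_by_gene.items()
    }
    # A species counts as covered if any gene dataset contains it.
    covered = set().union(*barcoded_by_gene.values()) & species
    return per_gene, 100.0 * len(covered) / len(species)


# Invented placeholder data for demonstration only.
checklist = ["Acropora tenuis", "Tridacna maxima", "Holothuria atra", "Chelonia mydas"]
gene_datasets = {
    "COI": ["Acropora tenuis", "Chelonia mydas", "Homo sapiens"],
    "12S": ["Chelonia mydas", "Tridacna maxima"],
}
per_gene, overall = taxon_coverage(checklist, gene_datasets)
print(per_gene, overall)
```

Exact string matching is a deliberate simplification; synonyms and spelling variants between name sources would need reconciling before such a comparison is meaningful.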


Employing DNA sequencing for species 
and community-level monitoring 


Comparing error rates and scalabilities of 
common DNA sequencing platforms 


Sanger sequencing produces data with the lowest error rates compared with second- and third-generation
sequencers (Table 1). It has been the gold standard for DNA barcoding (Shendure & Ji 2008)
and was the most used platform from 2012 to 2014 (Figure 1). Since 2014, HTS technologies
have rapidly diversified with the emergence of a variety of Illumina sequencing platforms
(MiSeq, HiSeq, iSeq, NextSeq, NovaSeq) and long-read sequencers [e.g., PacBio’s Sequel and ONT 
MinION]. The HTS data outputs are mainly classified into 1) short (<400 bp) and accurate reads or 
2) long reads (<4 Mbp fragment size) with comparatively higher error rates (Table 1). The latter have
been demonstrated to recover longer DNA barcodes with improved inter- and intraspecific resolution
(Krehenwinkel et al. 2019). Most importantly, HTS platforms have the scalability to accommodate
a wide range of project sizes and target taxa (Garlapati et al. 2019). As sequencing instruments
become more sensitive and their capacities grow, equally rapid advancements in computational performance
and standardisation of bioinformatic analyses are needed to keep pace with the sheer amount
of sequencing data produced (Langmead & Nellore 2018). These computational advances in turn
prevent bottlenecks in ecological analyses and enable the timely implementation of mitigation strat- 
egies in response to anthropogenic impacts on the environment (Langmead & Nellore 2018, Mathon 
et al. 2021, Macé et al. 2022). 


Expanding aperture of observation with Biomonitoring 2.0 


Routine biodiversity assessment is necessary for managing and protecting marine ecosystems 
(Hampton et al. 2013, Aylagas et al. 2018). For example, long-term temporal surveillance of abun- 
dance and distribution of a wide range of taxa (micro-to-macro flora and fauna) is key to revealing 
the anthropogenic impacts on the environment, for instance, through tracking changes in commu- 
nity structure in response to rising sea temperatures, overfishing, habitat destruction, introduction 
of alien species, and pollutants (Aylagas et al. 2018, Hering et al. 2018, Yip et al. 2021). However, 
traditional approaches typically involve direct observation (e.g., benthic surveys, baited remote 
underwater video surveillance) and physical organism sampling (Figure 2; e.g., nets, traps, elec- 
trofishing) that are oftentimes limited in scale and scope (survey sites and target taxa). Moreover, 
direct observational data can make comparisons difficult as the accuracy in organismal identifica- 
tion varies between taxonomic practitioners and life-history stages, and may be resolved at dif- 
ferent taxonomic levels. As such, species units could be clustered at higher taxonomic levels and 
the signals from species-level responses to environmental changes may be concealed, rendering 
biomonitoring efforts less effective.

Advancing genomic tools to enhance environmental monitoring can help improve understand- 
ing and management of marine ecosystems currently experiencing species decline and biodiversity 
loss (Dirzo et al. 2014). Deep sequencing with HTS, coupled with increased utility of bioinfor- 
matics, provides an opportunity to leverage the power of phylogenetic and population genomic 
approaches. Most importantly, these advancements facilitate the progression from single and multispecies
traditional assessments to whole ecosystem surveillance with DNA metabarcoding (Leray
& Knowlton 2015, Miya et al. 2015, Ip et al. 2021b). Leveraging the concepts of DNA barcoding
with HTS, DNA metabarcoding enables high-throughput, multispecies identifications in bulk envi- 
ronmental samples, with reduced reliance on taxonomic expertise for presorting (Beentjes et al. 
2019, Djurhuus et al. 2018, Mauffrey et al. 2021). Limitations in scope faced by traditional sorting 
based on Sanger sequencing workflows can also be addressed, since organisms from all size ranges 
and even visually cryptic taxa can be sampled with DNA metabarcoding; this results in data that 
tend to be complementary and comparable with conventional surveys (Lobo et al. 2017, Cahill et al. 
2018, Di Muri et al. 2020). Even as HTS has been shaping the field of molecular ecology (Baird &
Hajibabaei 2012, Aylagas et al. 2016, Pawlowski et al. 2018), Sanger sequencing and morphological 
methods remain relevant, as taxonomic assignments of metabarcoding sequences are largely depen- 
dent on reference databases that have thus far been built primarily from Sanger-sequenced barcodes 
(Steyaert et al. 2020). 

Nevertheless, ecosystem management approaches benefit from monitoring efforts that col- 
lect relevant ecological data consistently over extended periods of time (Compson et al. 2020). 
Advancements in HTS technologies are important here, as they have fuelled development of novel 
environmental monitoring frameworks. In particular, Biomonitoring 2.0 (Baird & Hajibabaei 
2012) establishes a universal comparison scheme with DNA-based species identification for tar- 
geting a wide range of biodiversity across different ecosystems with reduced reliance on taxo- 
nomic expertise (Zhang et al. 2018, Carvalho et al. 2019, Pearman et al. 2020, Ip et al. 2022a). 
Since its introduction in 2012, there has been a sharp increase in marine studies published using 
either DNA barcoding or metabarcoding techniques for biodiversity assessment and biomonitor- 
ing (Figure 1). At least 5,079 articles were published in the last decade, of which 1,734 were from 
the last 1.5 years (Figure 1). This increase can be attributed to the increased user accessibility and 
utility of HTS, following the inverse trends of technological advances and lowering costs over 
time (Grant et al. 2021). Moreover, the rapid proliferation of commercial sequencing companies 
in the last 5 years has also contributed to the diversification of sequencing applications (Slatko et 
al. 2018, Singer et al. 2019). 


(Meta)barcoding for community-level biomonitoring in marine ecosystems 


Current studies focus on a few prominent taxonomic groups, such as indicator, keystone, founda- 
tion, megafauna, and abundant species that are easily observable or already well-studied for overall 
community assessments (Hermosillo-Núñez et al. 2018, Mustika et al. 2021, Seymour et al. 2020,
Mendez et al. 2021). Recognising that most studies conducted so far show varying levels of taxo- 
nomic biases, the next step forward would be to ascertain which taxa are ecologically significant 
before expanding the monitoring scope to track these informative taxa that are often overlooked 
by observers (Carvalho et al. 2019, Seymour et al. 2020, Ip et al. 2022a). With recent progress
in the ability of DNA metabarcoding tools to simultaneously detect multiple species across a range
of body sizes (Porter & Hajibabaei 2018c), there is great promise in quantifying compositions 
of microbial, meiofauna, and macrofauna communities and for comparing species relative abun- 
dances within or between environmental bulk samples (Figure 2) (Pearman et al. 2019, Gaither 
et al. 2022, Klunder et al. 2022, Pawlowski et al. 2022). The enhanced throughput of organism 
detection has triggered wide-ranging applications in a broad range of habitats that include many 
not amenable to traditional survey techniques, demonstrating the broad utility of DNA metabar- 
coding across sample types (e.g., Antarctic sediment, Fonseca et al. 2022; sedimented seawater, 
Ip et al. 2021b; biofilms, Rivera et al. 2022). Most field-collected sample types are suitable for
metabarcoding analyses after laboratory processing: bulk tissue samples have to be homogenised
by blending or pestle grinding, whereas environmental samples require the concentration and
recovery of trace DNA from water and sediment (Figure 2). With bulk
sample processing, DNA metabarcoding can circumvent the time-consuming specimen sorting
component of traditional workflows and also detect rare or visually cryptic taxa that are typically 
missed by observers (Carvalho et al. 2019, Pearman et al. 2019, Ip et al. 2022a). As for environ- 
mental samples, genetic materials are being recovered from shed cells, excretions, and mucus sus- 
pended in water or sediment — also known as environmental DNA (eDNA) (Goldberg et al. 2016, 
Jo et al. 2022a). As such, non-invasive biomonitoring approaches utilising eDNA tools are becoming
one of the most popular metabarcoding applications today, as they reduce field-experimental
challenges by eliminating the need for direct observation or organism capture. Hence, this allows
for the indirect detection of rare, endangered, or elusive organisms that are typically challenging to 
survey or capture (Boussarie et al. 2018, Ip et al. 2021a, Mathon et al. 2022, Richards et al. 2022, 
Zainal Abidin et al. 2022). Notably, the same DNA metabarcoding techniques have also been 
applied in environmental RNA metabarcoding, which can further elucidate the viability of signals 
from active communities and quantify organismal responses to environmental changes (Marshall 
et al. 2021, Ankley et al. 2022, Zaiko et al. 2022). 


Enhancing Biomonitoring 2.0 with new molecular tools 


Integrating molecular techniques and taxonomic tools 


Typical traditional monitoring methods such as direct sampling, organism capture, camera trapping, 
and visual census (Figure 2, Wong et al. 2018, Lim et al. 2020, Taira et al. 2020) are susceptible to 
various sampling limitations, including destructive sampling, site inaccessibility, observer biases, 
and overlooking neglected taxa. They also tend to focus on conspicuous groups (Pearman et al. 
2018, Ip et al. 2022a), leading to high levels of false negatives, and overall, inaccurate community 
assessments and biodiversity estimates. 

An integrative approach with molecular techniques complementing traditional methods 
enhances the efficiency of monitoring marine biodiversity (Chang et al. 2022a, Czachur et 
al. 2022, Wang et al. 2018, Ip et al. 2019, Richards et al. 2022). Instead of starting with con- 
ventional morphological identification, a reverse workflow approach would mean molecular 
methods are first used to identify specimens by matching them to databases of previously 
identified and barcoded species. This can accelerate the process of species discovery and iden- 
tification, since it eliminates the prerequisite of involving trained taxonomists in species moni- 
toring efforts (Wang et al. 2018). Even for undescribed species with no reference barcodes, the 
reverse workflow can help rapidly sort specimens into molecular operational taxonomic units 
(MOTUs) based on sequence similarity, and subsequently, direct follow-up morphology-based 
identification work. 
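
The sorting of specimens into MOTUs by sequence similarity can be sketched with a simple greedy clustering routine. This is a generic illustration, not the method of any study cited here: it assumes pre-aligned barcodes of equal length and an illustrative 3% uncorrected distance threshold, and the specimen labels and sequences are invented.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_motus(sequences: dict, threshold: float = 0.03):
    """Greedy clustering of {specimen id: sequence} into MOTUs.

    Each sequence joins the first MOTU whose seed sequence lies within the
    threshold distance; otherwise it founds a new MOTU. The 3% default is an
    assumed, illustrative value.
    """
    motus = []  # list of (seed sequence, [specimen ids])
    for specimen_id, seq in sequences.items():
        for seed, members in motus:
            if p_distance(seq, seed) <= threshold:
                members.append(specimen_id)
                break
        else:
            motus.append((seq, [specimen_id]))
    return [members for _, members in motus]


# Invented 40-bp toy barcodes for three hypothetical specimens.
specimens = {
    "specimen_1": "ACGT" * 10,
    "specimen_2": "ACGT" * 9 + "ACGA",  # one mismatch relative to specimen_1
    "specimen_3": "TGCA" * 10,          # differs at every position
}
motus = cluster_motus(specimens)
print(motus)
```

Greedy clustering depends on input order and on the chosen threshold, which is one reason MOTUs are treated as putative species units that direct, rather than replace, follow-up morphological work.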

Notwithstanding these advantages, appropriate experimental designs are crucial for effective 
species detection with molecular techniques. It must be emphasised that pilot experiments are 
needed to assess feasibility of study designs and to optimise methods (Furlan et al. 2016, Goldberg 
et al. 2016) since there is no single template procedure suitable for processing every sample type 
while targeting all species (Barnes & Turner 2016, de Souza et al. 2016). Furthermore, method- 
ological biases can potentially be introduced at every experimental stage, from sample collection, 
gene amplification, and sequencing to data analysis, which could result in erroneous ecological inferences
(Gold et al. 2022, van der Loos & Nijland 2021, Zinger et al. 2019). Prevailing considerations
in study designs include minimisation of potential biases across all steps of the workflow,
optimisation of the bioinformatics pipeline, selection of appropriate gene region(s) of interest,
taxon coverage of barcoded genes in reference databases, as well as managing contamination
and misidentification (Gold et al. 2022, Martins et al. 2021, Richards et al. 2022, van der Loos &
Nijland 2021, Zaiko et al. 2022).


Sample collection, fixation, and processing


Although there are many different techniques for sample collection, we focus on five commonly 
used sampling tools as depicted in Figure 2. The types of samples collected with these sampling
tools range from water, sediment, and bulk samples to specimen vouchers, each of which is typically processed
differently (Figure 2). Downstream molecular methods are relatively consistent following the sample
processing stage, but the type of final data output for ecological analyses is largely contingent on the 
specific research question (Figure 2). This may either involve sequencing on different platforms or 
employment of rapid detection technologies without sequencing involvement (Figure 2). 

DNA barcoding studies focus primarily on biodiversity assessments using DNA-based identifi- 
cation of collected specimens. They typically engage in tissue subsampling for molecular analyses 
and preserve whole specimen vouchers for morphological identification. Hard-bodied organisms, 
such as invertebrates with shells (molluscs) or exoskeletons (arthropods), are preserved as vouchers 
in 70% ethanol, while soft-bodied organisms like polychaetes, fish, and flatworms are often fixed in
formalin (Vivien et al. 2018). It is noteworthy that formalin-fixed tissues are suboptimal for DNA 
sequencing (Hahn et al. 2022, Raxworthy et al. 2021). Nevertheless, formalin fixation ensures mini- 
mal loss of key features for morphology-based taxonomic work, while subsamples from expendable 
body parts are used to generate high-quality DNA barcodes tagged to properly identified vouchers. 

Furthermore, DNA metabarcoding is employed by studies with the aim of simultaneous identifi- 
cation of multiple taxa or characterisation of community composition from mixed-sample pools and 
bulk samples. Environmental bulk samples are collected by hand (e.g., surface water sampling with 
sterile plastic bottles), with a horizontal (e.g., Van Dorn) or vertical (e.g., Niskin) water sampler for 
seawater sampling, and a grab or corer is used for sediment sampling (Figure 2). Since environmen- 
tal DNA signals in seawater can be distinct between sampling depths (DiBattista et al. 2019, Jeunen 
et al. 2020, Monuki et al. 2021, but see Ip et al. 2021b), a working understanding of the target species’
biology is key for determining the appropriate seawater sampling depth. For instance, detecting 
pelagic species requires sampling of the mid-water column while collecting seawater near the sea- 
bed can improve the detection of benthic species (Antich et al. 2021a). Vacuum or peristaltic pumps 
are typically used to concentrate genetic material from the seawater onto a porous membrane (via
ultrafiltration) for downstream molecular processing. For sample preservation, filter membranes can 
be dry-frozen, stored at -20°C in ethanol or cell-lysis buffers (e.g., Longmire’s solution, Sarkosyl 
buffer), desiccated using silica beads, or kept in self-preserving filter membrane housing units until
further work is conducted (Thomas et al. 2019, Williams et al. 2016, Mauvisseau et al. 2021). 

Sediment subsampling is done by collecting sediment with sterile spatulas or syringes
from the surface and the centre of the grab or core sample, so as to ensure sample integrity while 
avoiding vertical admixture (Lins et al. 2021, Pawlowski et al. 2022). The subsamples are stored 
in 100% molecular grade ethanol or dimethyl sulfoxide (DMSO) solution and frozen till further 
processing (Ransome et al. 2017). Notably, preprocessing of samples may be required before sea- 
water filtration or sediment sampling. Depending on the nature of the sample, prefiltering of turbid 
seawater samples with larger pore-sized membranes prevents clogging of filters and can help reduce 
amplification inhibition in downstream molecular processes, but also risks decreasing DNA yield 
(Stoeckle et al. 2017, Hunter et al. 2019). Similarly, sieving sediment samples that may contain the 
larger body-sized macrofauna helps to isolate these larger animals present within the sediment 
(Aylagas et al. 2018, Gielings et al. 2021), which could otherwise contribute disproportionately more 
DNA to the mixed-sample pools and mask signals from other rare taxa during metabarcoding. 

Lastly, studies that utilise standardised sampling devices, e.g., Autonomous Reef Monitoring 
Structures (ARMS, Leray & Knowlton 2015), are focused on the sampling of the reef matrix, which 
can generate many different sample types, such as meio- and macrofauna specimen vouchers, 
bulk samples of encrusted material, and sediment and seawater environmental samples. As such, 
ARMS require a combination of wet-lab processing techniques to collect and preserve the samples
(Leray & Knowlton 2015), which include preprocessing with serial sieving (<2 mm mesh size) of
meio- and macrofauna for tissue subsampling and fixation (Ransome et al. 2017), collection of sedi- 
ment from encrusted plates for homogenisation (Aylagas et al. 2016), and filtering of containment 
seawater for the concentration of genetic material (Figure 2, Aylagas et al. 2016, Ip et al. 2021b, 
Nichols et al. 2022). 


Choice of gene markers and taxonomic resolution 


The same primers used for DNA barcoding of single species specimens can be applied for DNA 
metabarcoding assays that target the same gene loci for species delimitation. Critically, slight modi- 
fications must be made to the 5’ terminus of the primer sequence to render it suitable for DNA 
metabarcoding. The addition of a short (8–9 bp) and unique oligonucleotide sequence distinguishes
between sets of primers used on different samples in the same assay, thereby permitting sample 
multiplexing during sequencing (Meier et al. 2016). These uniquely tagged primers are assigned 
to individual or mixed samples and must not overlap with one another so that downstream demultiplexing,
that is, bioinformatically sorting and assigning sequences back to their respective sample
‘bins’, can occur.
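
Demultiplexing by primer tag can be sketched as follows. This is a minimal illustration assuming a single 8-bp tag at the 5' end of each read and exact tag matching; the tag sequences, sample names, and reads are invented, and real workflows typically also handle tags on both ends, sequencing errors within tags, and primer trimming.

```python
def demultiplex(reads, tag_to_sample, tag_length=8):
    """Assign reads to sample bins according to their leading tag.

    reads: iterable of read sequences beginning with a sample tag.
    tag_to_sample: {tag sequence: sample name}; tags must all be tag_length long.
    Returns (bins, unassigned), where bins maps each sample to its reads with
    the tag removed and unassigned holds reads whose tag was not recognised.
    """
    if any(len(tag) != tag_length for tag in tag_to_sample):
        raise ValueError("all tags must have the stated tag length")
    bins = {sample: [] for sample in tag_to_sample.values()}
    unassigned = []
    for read in reads:
        tag, insert = read[:tag_length], read[tag_length:]
        sample = tag_to_sample.get(tag)
        if sample is None:
            unassigned.append(read)
        else:
            bins[sample].append(insert)
    return bins, unassigned


# Invented tags, sample names, and reads for demonstration only.
tags = {"AACCGGTT": "reef_site_1", "TTGGCCAA": "reef_site_2"}
reads = [
    "AACCGGTTACGTACGT",
    "TTGGCCAAGGGTTTAA",
    "AACCGGTTTTTTCCCC",
    "CCCCCCCCACGTACGT",  # tag not in the design
]
bins, unassigned = demultiplex(reads, tags)
print(bins)
print(unassigned)
```

Because assignment relies on the tag alone, two samples sharing a tag would be indistinguishable after pooling, which is why tags must be unique within a sequencing run.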

Nevertheless, the selection of gene regions for (meta)barcoding is closely associated with the 
availability of published molecular markers and the robustness of reference barcode databases 
for providing sufficient resolution in species identification. It is imperative to target the most suit- 
able or well-represented gene locus for barcoding or metabarcoding assays since the downstream 
analyses hinge on the species delimitation resolution that the gene marker provides. To maxi- 
mise the barcoding gap that separates intra- and interspecific variation (Meyer & Paulay 2005), 
in silico experiments can identify gene regions with adequate variation for unambiguous species 
delimitation, while ideally being flanked by conserved regions to allow primer binding for poly- 
merase chain reaction (PCR). Therefore, mitochondrial gene fragments like COI (26.6% of marine 
metazoan GenBank sequences), 12S rRNA (11.8%), and 16S rRNA (19.3%) are common target 
loci since these are relatively well represented in reference databases (Figure 3) and sufficiently 
variable for species delimitation in many taxa (Thomsen et al. 2012, Kelly et al. 2014, Valentini 
et al. 2016, Sigsgaard et al. 2020, but see Anthozoa (Huang et al. 2008, McFadden et al. 2011), 
Annelida (Sun et al. 2012), and Platyhelminthes (Vanhove et al. 2013)). To ensure meaningful 
taxonomic assignments of sequence data, emphasis must be placed on having comprehensively 
curated and updated sequence databases, which can also provide feedback on the specificity of the 
assay design. Ideally, a local database specific to the study area should be established to improve 
the precision of sequence matches (Bucklin et al. 2021, Dugal et al. 2022, Gold et al. 2021, Ip et 
al. 2019, Keck & Altermatt 2022). 
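
As a rough illustration of the in silico check described above, the barcoding gap for a candidate marker can be estimated from labelled reference sequences. The sketch below assumes that sequences are already aligned to equal length and uses uncorrected p-distances; the function names and the toy data are hypothetical, and real analyses would typically use proper alignments and substitution models.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcoding_gap(records: list[tuple[str, str]]) -> float:
    """Return min interspecific minus max intraspecific p-distance.

    records holds (species_name, aligned_sequence) pairs. A positive value
    suggests a barcoding gap for this marker in this dataset.
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    if not intra or not inter:
        raise ValueError("need within-species and between-species pairs")
    return min(inter) - max(intra)


# Toy data: two hypothetical species with two 10 bp sequences each.
toy_records = [
    ("Species A", "AAAAAAAAAA"),
    ("Species A", "AAAAAAAAAT"),
    ("Species B", "AAAAACCCCC"),
    ("Species B", "AAAAACCCCG"),
]
toy_gap = barcoding_gap(toy_records)
```

A marker for which this value is near zero or negative in the taxa of interest would be a poor choice for species delimitation in that group.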

Even when a gene marker offers species delimitation resolution, incomplete reference sequence 
databases can result in the barcodes not matching with species-level epithets, while primer mis- 
matches and poor binding affinities can also lead to false negative detections. Intuitively, sequences 
without species names may seem meaningless, for instance, in conservation or invasive species 
monitoring programmes (Locatelli et al. 2020). However, for studies focused on overall diversity 
estimation and community-level responses to environmental changes, they remain relevant as the
putative species units from a taxonomy-free approach become the common denominator for comparisons
of poorly studied, rare, and morphologically cryptic organisms across various habitats
(He et al. 2019, Rodrigues et al. 2021). Furthermore, poor primer binding affinities due to mis- 
matches of priming sequences between different species could lead to false negative detections, 
especially when the same universal primers designed for single-specimen DNA barcoding are used 
for bulk or environmental metabarcoding (Deagle et al. 2014, Ip et al. 2021b). Primer mismatches 
also inevitably occur due to varied binding affinities of primers to different species’ template DNA 
in mixed-sample pools (Piñol et al. 2015, Piper et al. 2019, Ip et al. 2021b), potentially resulting in
amplification biases. A multimarker approach would thus help circumvent these issues, as it could
recover species diversity more inclusively or precisely, and reduce false negative detections from
primer binding incompatibilities or barcode underrepresentation in reference databases. Notably,
the multilocus approach is also contingent on barcode representation on global databases. For
instance, there exist 37,223 marine animal species with mitochondrial DNA barcodes, according to
MIDORI2 (GenBank Release 249) (Figure 3). Among the 13 mitochondrial protein-coding and two
ribosomal RNA loci from across 31 marine animal phyla, the top three most well-represented loci
are COI (13.3-69.7%), 16S (11.2-33.3%), and 12S (11.3-11.9%). Since the representation is much
poorer for the remaining mitochondrial genes (0.05-12.6%), studies designing multilocus experiments
should target at least COI, 16S, or 12S (Figure 3).

Figure 3 (A) Global distribution of the 37,223 marine animal species with mitochondrial DNA barcodes,
based on author-submitted locality metadata from GenBank Release 249, consolidated by MIDORI2. (B)
Stacked bar chart showing numbers of mitochondrial barcoded marine and non-marine species for each gene.
(C) Stacked bar chart showing the relative frequencies of the 13 mitochondrial protein-coding and 2 ribosomal
RNA barcodes available for all marine taxa.


Target amplicon size in relation to sequencing platform 


Different sequencing platforms return different read lengths (Table 1), which in turn can impact 
taxonomic resolution (Krehenwinkel et al. 2019, Latz et al. 2022, Martijn et al. 2019, Yeo et al. 
2020). For example, with Illumina platforms, the maximum possible barcode length that can be 
generated with minimal error rates is up to 500 bp, which means that primers must be designed 
to target gene regions of less than 400 bp (Table 1). Since the performance of most HTS platforms
is limited to sequencing shorter fragments, universal DNA barcoding markers like the 658 bp region
introduced by Folmer et al. (1994) are incompatible with paired-end sequencing. Instead,
only mini-barcoding markers like the 313 bp fragment by Leray et al. (2013) are feasible with HTS.
In contrast, Sanger-based DNA barcoding can target gene regions of up to 1-2 kbp. While shorter
fragments can be amplified and sequenced more easily (Cruaud et al. 2017), longer amplicons
typically have higher discriminatory power to resolve closely related species (Krehenwinkel et al.
2019, Latz et al. 2022, Martijn et al. 2019).

Table 1 Comparison of performance, yield, error rates, and costs between Sanger and five
representatives of high-throughput sequencing platforms (Chang et al. 2020a, Dohm et al. 2020,
Quail et al. 2012, Shendure & Ji 2008, Stoler & Nekrutenko 2021)

                                                            Short-read platform                                     Long-read platform
                                  Sanger                    Illumina MiSeq    Illumina HiSeq      Illumina NovaSeq  ONT MinION                PacBio Sequel II
                                                            V2/V3             4000/2500           6000 S2
Sequencing cost per sample (USD)  ~$18                      ~$30              ~$7/~$20            ~$4               ~$6                       ~$10
Manpower for preparation          Low                       Medium            Medium              Medium            Low                       High
Maximum read length               1-2 kbp                   151 bp/251 bp     151 bp/251 bp       151 bp            ~2 Mbp                    ~100 Kbp
Throughput                        Single consensus barcode  8.5 Gbp/15 Gbp    1500 Gbp/300 Gbp    1250 Gbp          20 Gbp                    160 Gbp
                                  per sample
Error rate                        ~0.001%                   0.473%            0.112%              0.109%            R9.4.1: ~6%; R10.3: ~4%   13%
Run time                          ~4 hours                  56 hours          84 hours            56 hours          0.5-72 hours              15 hours
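
The read-length constraint discussed in this section can be made concrete with a small calculation. The sketch below assumes that a paired-end amplicon is recoverable only if the forward and reverse reads overlap by some minimum number of bases; the 20 bp default overlap is a hypothetical value chosen for illustration, and the function names are not taken from any cited tool.

```python
def max_mergeable_amplicon(read_length: int, min_overlap: int = 20) -> int:
    """Longest amplicon whose paired-end reads still overlap by min_overlap bases."""
    return 2 * read_length - min_overlap


def fits_paired_end(amplicon_length: int, read_length: int, min_overlap: int = 20) -> bool:
    """True if forward and reverse reads of this length can be merged."""
    return amplicon_length <= max_mergeable_amplicon(read_length, min_overlap)


# 251 bp paired-end reads: the 313 bp mini-barcode fits, the 658 bp region does not.
leray_ok = fits_paired_end(313, read_length=251)
folmer_ok = fits_paired_end(658, read_length=251)
```

Under these assumptions, 2 x 251 bp reads can cover at most 482 bp, which is consistent with designing primers for targets comfortably below that limit once primer and tag lengths are accounted for.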

The long-read capabilities of ONT’s third-generation MinION sequencer can bypass some of the
utility shortcomings of both Sanger and Illumina technologies for DNA barcoding and metabarcoding
(Pomerantz et al. 2018, Krehenwinkel et al. 2019). However, the MinION’s single flow cell
has a relatively low throughput of ~50 Gbp. Only ONT’s GridION or PromethION platforms
can comfortably replicate Illumina’s sequencing yield (Table 1); these have higher theoretical
maximum outputs of 250 Gbp to 14 Tbp because they accommodate multiple flow cells (5-48)
simultaneously for sequencing (Pervez et al. 2022). With longer-read technology, such applications
will be less restricted by amplicon length, which shifts the focus to planning considerations
concerning sequencing depth and the nature and state of samples. On the one hand,
degraded samples are expected to have more fragmented DNA, and targeting shorter fragments (still
>150 bp) likely increases detection success (Wainwright et al. 2018, Choo et al. 2021). On the other
hand, fresh tissue samples can be targeted for longer barcodes (or even full-length genes, Günther
et al. 2022) to attain a more unambiguous delimitation of species (Quek et al. 2019, 2021, Chang et
al. 2022a). As such, a suite of sequencing technologies can be assembled to suit each study’s needs
after reviewing sample DNA quality (fresh or degraded) and the funding available.


Decontamination and replication 


Contamination is a major challenge for molecular approaches as it can potentially skew results and 
generate erroneous conclusions (Goldberg et al. 2016, Hansen et al. 2020). Contamination is not limited
to laboratory settings and can also arise during fieldwork and equipment assembly. Clean, aseptic
practices must be implemented at all stages of wet-lab work to reduce contamination risks (Schweiss
et al. 2020). Disinfection protocols include UV irradiation, autoclaving, rinsing with distilled
water or soapy water, enzymatic treatment with DNases, and wiping down with bleach to inactivate any
residual biological materials. Overall, 10% household bleach and the use of single-use, sterilised
and DNase-treated consumables are strongly encouraged for ensuring decontamination (Goldberg 
et al. 2016, Ip et al. 2021a). Besides rigorous disinfection protocols, it is critical to include negative 
controls at every stage of the experiment to facilitate identification of contamination sources and 
prompt rectification procedures (Goldberg et al. 2016, Williams et al. 2019), such as taking addi- 
tional steps in lab cleaning or bioinformatic filtering (Barba et al. 2014). Most importantly, these 
negative controls must be processed the same way as samples (Goldberg et al. 2016). 

Biological replicates are repeated collections of samples from the same sampling site. They are 
critical for estimating diversity and determining the frequency of species occurrences, especially 
for rare taxa (Bessey et al. 2020, West et al. 2020). Technical replicates are subsamples gener- 
ated from within the same biological sample (Fonseca 2018). Increasing the number of technical 
replicates can help improve detection consistency and confidence as well as enhance likelihood of 
detecting rare taxa (Ficetola et al. 2015). On one hand, a cumulative approach of combining signals 
across replicates reduces false negative detections of rare taxa; while on the other hand, a minimum 
threshold approach increases statistical confidence of species detection, e.g., deemed a true signal if 
detected in >50% of the replicates (Van den Bulcke et al. 2021). Nevertheless, considerations must 
be made regarding the rise in costs and labour intensity with larger numbers of replicates. As such, it 
is recommended that future studies conducting DNA metabarcoding minimally use three technical 
replicates, and species present in at least two replicates can be more confidently interpreted as a true 
biological signal (van der Loos and Nijland 2021). 
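
The two ways of combining technical replicates described above can be sketched as follows. This is a minimal illustration in which each replicate is represented as a set of detected taxon names; the function names and example taxa are hypothetical, and the default threshold of two replicates mirrors the "at least two of three" recommendation.

```python
from collections import Counter


def replicate_support(replicates: list[set[str]]) -> Counter:
    """Count in how many replicates each taxon was detected."""
    support: Counter = Counter()
    for taxa in replicates:
        support.update(taxa)
    return support


def cumulative_detections(replicates: list[set[str]]) -> set[str]:
    """Cumulative approach: keep every taxon seen in any replicate."""
    return set().union(*replicates)


def threshold_detections(replicates: list[set[str]], min_replicates: int = 2) -> set[str]:
    """Threshold approach: keep taxa seen in at least min_replicates replicates."""
    support = replicate_support(replicates)
    return {taxon for taxon, n in support.items() if n >= min_replicates}


# Three hypothetical technical replicates of one sample.
reps = [{"taxon_1", "taxon_2"}, {"taxon_1"}, {"taxon_1", "taxon_2", "taxon_3"}]
all_taxa = cumulative_detections(reps)
confident_taxa = threshold_detections(reps, min_replicates=2)
```

Here the cumulative approach retains the singleton detection of taxon_3, whereas the threshold approach drops it, which illustrates the trade-off between false negatives and detection confidence.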


Bioinformatics 


Today, a myriad of open-source, command-line programs (Table 2), such as OBITools (Boyer
et al. 2016) and DADA2 (Callahan et al. 2016), can capably handle the analyses of large biodiversity
datasets containing billions of sequence reads in a short time. Bioinformatic processing
of sequences is the last step of any barcoding or metabarcoding workflow (Figure 2), and the
way the sequencing data are processed and analysed, including the filtering thresholds applied,
can influence the final detection outcomes of target species (Mathon et al. 2021). Strict quality
filtering thresholds are useful for removing erroneous sequences from downstream analyses, but
caution must be exercised against the over-elimination of reads, which can lead to false
negatives for rare taxa.

Table 2 List of open-source bioinformatic program examples for DNA (meta)barcoding analyses

Name              Web resource                                            Type           Reference
OBITools          https://pythonhosted.org/OBITools/welcome.html          Command line   Boyer et al. (2016)
DADA2             https://benjjneb.github.io/dada2/index.html             Command line   Callahan et al. (2016)
eDNAFlow          https://github.com/mahsa-mousavi/eDNAFlow               Command line   Mousavi-Derazmahalleh et al. (2021)
FROGS             http://frogs.toulouse.inra.fr/                          GUI            Escudié et al. (2018)
QIIME2studio      https://docs.qiime2.org/2019.4/interfaces/q2studio/     GUI            Bolyen et al. (2019)
SLIM              https://trtcrd.github.io/SLIM/                          GUI            Dufresne et al. (2019)
TaxonTableTools   https://github.com/TillMacher/TaxonTableTools           GUI            Macher et al. (2021)

Most HTS data are generated via paired-end sequencing and therefore comprise forward and reverse
sequence reads. The first bioinformatics step is to merge the forward and reverse
reads using a minimum Phred quality score and base overlap threshold (Ewing et al. 1998),
which the user determines by examining the sequencing run profile in the FastQC program
(Andrews 2019). Since barcoding and metabarcoding libraries are multiplexed with large numbers
of samples for cost-effectiveness on HTS platforms, the next step, demultiplexing, involves sorting
or assigning sequences back to their respective samples by using the unique oligonucleotide tags (8-9 bp,
Bohmann et al. 2022, Ip et al. 2021a,b, see also Section on Choice of gene markers and taxonomic
resolution) at each read’s 5’ terminal ends. The adapters, unique multiplexing
tags, and primer sequences are also removed during the demultiplexing step. Subsequently, quality
filtering removes sequencing errors and retains sequences with the correct fragment size and a minimum
number of read counts for statistical confidence (Mathon et al. 2021). Next, clustering with
arbitrary thresholds [e.g., maximum 3% difference in bases for COI sequences (Hebert et al. 2003); 
2% difference for 12S V5 (Riaz et al. 2011)] groups highly similar sequences together into putative
species units or MOTUs. This application of ‘universal’ clustering thresholds assumes a distinct
gap between inter- and intraspecific variation but, in actuality, this gap can vary considerably among
taxa (Collins & Cruickshank 2013). Alternatively, users can avoid the use of arbitrarily employed
thresholds and retain unique sequences, such as in the form of amplicon sequence variants (ASVs) 
(Callahan et al. 2016) or zero-radius operational taxonomic units (ZOTUs) (Edgar 2016b). The 
generation of ASVs is a denoising technique as erroneous sequences are removed and retained 
sequences are distinguishable by a single nucleotide difference (Eren et al. 2013). Compared to 
clustering of sequences into MOTUs using arbitrary thresholds, the utility of ASVs confers advan- 
tages in allowing finer distinction of sequences, experimental reproducibility, and comparison 
across different studies. It is noteworthy that the generation of MOTUs and ASVs involves dif- 
ferent sequence processing approaches that may influence inferences in biodiversity assessments. 
Nevertheless, both methods have produced largely consistent biological conclusions (Glassman 
& Martiny 2018), with a few studies recommending that ASVs are more suitable for recover- 
ing microbial diversity (Chiarello et al. 2022). Moreover, emerging studies have suggested that 
both sequence processing methods are more complementary than equivalent (Antich et al. 2021b, 
Cholet et al. 2022, Schloss 2021), with the incorporation of both approaches into COI metabarcod- 
ing bioinformatic pipelines highly recommended (Antich et al. 2021b). Lastly, sequence alignment 
and similarity searches with BLAST allow the taxonomic assignment of ASVs or MOTUs through 
the matching of the query sequences to the most similar references on the sequence database 
(Altschul 1990). 
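
To make the contrast between threshold-based clustering and the retention of unique sequences more tangible, the sketch below dereplicates reads and then applies a simple greedy, abundance-ordered clustering at a user-set identity threshold. It is a simplified illustration only: it assumes error-free, equal-length sequences compared position by position, it is not the algorithm of any program listed in Table 2, and true ASV or ZOTU inference additionally involves denoising with an error model, which is not attempted here.

```python
from collections import Counter


def dereplicate(reads: list[str]) -> Counter:
    """Collapse identical reads into unique sequences with read counts."""
    return Counter(reads)


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(unique_counts: Counter, threshold: float = 0.97) -> dict[str, int]:
    """Assign each unique sequence to the first centroid within the threshold.

    Sequences are visited from most to least abundant, so abundant sequences
    become centroids. Returns centroid sequence -> total read count.
    """
    centroids: dict[str, int] = {}
    ordered = sorted(unique_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for seq, count in ordered:
        for centroid in centroids:
            if len(centroid) == len(seq) and identity(seq, centroid) >= threshold:
                centroids[centroid] += count
                break
        else:
            centroids[seq] = count  # no close centroid found: start a new cluster
    return centroids


# Toy reads: one abundant sequence, a 1% variant of it, and a 10% divergent sequence.
base = "A" * 100
variant = "A" * 99 + "C"
distant = "A" * 90 + "C" * 10
toy_reads = [base] * 5 + [variant] * 2 + [distant] * 3
unique_seqs = dereplicate(toy_reads)
motus = greedy_cluster(unique_seqs, threshold=0.97)
```

In this toy case, retaining unique sequences yields three units, whereas clustering at 97% merges the 1% variant into its abundant neighbour and yields two, which shows how the choice of approach can change the number of putative species units.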

Despite being the staple for ecological analyses of HTS data, bioinformatics can be overwhelming
for new users, as the required skillsets are distinct from the technical expertise of conventional
molecular ecologists. In such instances, the bioinformatics work can either be commercially
outsourced or be carried out with Graphical User Interface (GUI) programs
(Table 2), which have broad user applicability since coding proficiency is not required. These
programs bypass the complexity of command-line scripting and pave the way for new users to analyse
their own DNA metabarcoding data.


Managing misidentification 


Misidentification during taxonomic work is inevitable, since the accuracy in species identification 
is contingent on a multitude of factors like the taxonomist’s proficiency, quality of taxonomic keys 
(details can be lost in translation rendering them incomplete), and the preservation state of the speci- 
men (Bush et al. 2019). DNA-based methods can potentially be advantageous in scenarios where 
they can reduce taxonomic misidentifications and complement conventional morphological tools 
(DeSalle 2006). In particular, DNA barcoding allows more consistent species identification of taxa 
that have been formally described and previously barcoded with markers possessing adequate spe- 
cies delimitation resolution (Wang et al. 2018). However, DNA-based methods are not without flaws 
and remain largely dependent on reliable morphological identifications, suitable marker choice,
and robustness of the reference barcode databases (Hleap et al. 2021). Critically, a specimen that
is taxonomically misidentified will have an incorrect species name tagged to the DNA barcode in
global reference databases, and such instances have been routinely uncovered (Liu et al. 2017, Porter
& Hajibabaei 2018c). This causes chains of misidentifications when subsequent studies
BLAST query sequences against erroneous reference sequences (Leray et al. 2019, Locatelli et al.
2020).

We recommend, where available, also matching query sequences to alternative reference databases
that may be smaller but better curated. For instance, the Barcode of Life Data System (BOLD) can be used in
conjunction with GenBank for more accurate matching of COI sequences (Ratnasingham & Hebert
2007). Moreover, BOLD has advanced its ability to automate and organise batch sequence searches
with BOLDigger (Porter & Hajibabaei 2018a, Buchner & Leese 2020), and the recent emergence of
curated reference databases like MIDORI2 (Leray et al. 2022) and PR2 (Guillou et al. 2012) can
collectively address the misidentification challenges, thereby increasing the accuracy of DNA-based
species identification. Curated reference databases are not limited to COI sequences; marine eukaryotes
barcoded at other gene loci like the nuclear 18S have the curated PR2 database for sequence matching.
Lastly, curated databases like MIDORI2 and PR2 provide output formats that are compatible with
Bayesian or other taxonomic classifier programs like CONSTAX2 (Liber et al. 2021), RDP classifier (Wang
et al. 2007), and SINTAX (Edgar 2016a). Classifier programs provide statistical support for every level
of taxonomic classification of the query sequences, raising identification confidence and reducing the
likelihood of barcode misidentification.


Challenges and prospects of implementation 


Reference databases 


The effectiveness and efficiency of DNA barcoding and metabarcoding are dependent on reli- 
able taxonomic assignment by matching to curated databases with robust representation of spe- 
cies sequences. However, the problem of incomplete databases that have insufficient taxonomic 
resolution for a large portion of published DNA sequence data remains unaddressed (West et al. 
2021, Rourke et al. 2022). Improvements in reference sequence matching and primer resolution are 
urgently needed. The former requires enhancing the completeness of reference databases, since 
most sequences from metabarcoding studies remain unassigned at the species level. For instance, 
COI barcodes with matches at <97% to reference sequences or >97% to database records without 
species epithets are considered ‘unidentified’ and only used as unnamed assemblages or putative 
species for coarse community assessments (Ip et al. 2021b). While single-gene Sanger barcoding
has helped build sequence databases and should continue to do so, high-throughput technologies for
the rapid barcoding of specimens (Srivathsan et al. 2019, Chang et al. 2020a, 2022a) or assembly 
of mitogenomes (Quek et al. 2019, 2021, Chang et al. 2022b) may in certain cases exponentially 
expand reference databases (Porter & Hajibabaei 2018b), although DNA barcoding and species 
identification of marine organisms in many geographic areas are currently still lacking (Figure 3A). 
More importantly, complementary morphology-based taxonomic work should be a requirement for 
barcoded specimens to be assigned species names before submission of DNA barcodes to reference 
databases. This measure would reduce the proportion of database entries with inaccurate or imprecise
names, and save the time and effort needed to correct erroneously tagged specimen vouchers
and barcodes.
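
The rule of thumb mentioned above for COI matches, in which a query counts as identified only if it matches a reference at 97% or more and that reference carries a species epithet, can be expressed as a short function. This is a hedged sketch: the function name, the treatment of exactly 97% as a match, and the heuristic for recognising placeholder names such as "sp." are assumptions made for illustration.

```python
# Placeholder second words that indicate a record without a species epithet (assumed list).
PLACEHOLDERS = {"sp", "sp.", "spp.", "cf.", "aff."}


def has_species_epithet(reference_name: str) -> bool:
    """Heuristic: a binomial whose second word is not an open-nomenclature placeholder."""
    words = reference_name.split()
    return len(words) >= 2 and words[1].lower() not in PLACEHOLDERS


def classify_match(percent_identity: float, reference_name: str, threshold: float = 97.0) -> str:
    """Label a best hit as 'species-level' or 'unidentified'."""
    if percent_identity >= threshold and has_species_epithet(reference_name):
        return "species-level"
    return "unidentified"


# Hypothetical best hits for three query sequences.
labels = [
    classify_match(98.5, "Acropora millepora"),
    classify_match(98.5, "Acropora sp."),
    classify_match(95.0, "Acropora millepora"),
]
```

Queries labelled as unidentified would still be retained as putative species units for community-level comparisons rather than discarded.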

Species identification based on mitochondrial DNA barcoding is feasible for most marine ani- 
mals as several gene loci (12S, 16S, COI) are adequately variable and possess a distinct barcoding 
gap for species delimitation (Meyer & Paulay 2005). However, a few marine taxa
are either difficult to barcode due to non-specific binding of universal primers or lack a
barcoding gap in the commonly targeted barcode regions for identification. For instance, serpulid
calcareous tubeworms (Annelida) are challenging to amplify for COI due to poor primer binding
affinities (Sun et al. 2012), while species delimitation is ineffective with the COI gene for
Platyhelminthes (Vanhove et al. 2013) and Anthozoa (Huang et al. 2008, McFadden et al. 2011).
Nevertheless, taxon-specific primers or the use of multimarker approaches targeting different gene 
regions can help distinguish closely related taxa and collectively increase the overall resolution 
of taxon delimitation (Pearman et al. 2018, Ip et al. 2022b), addressing both DNA barcoding and 
metabarcoding issues associated with the lack of barcoding gap, amplification biases, and insuf- 
ficient primer delimitation resolution. Fortunately, improved markers are continually designed for 
barcoding anthozoans in the COI region (McFadden et al. 2011, Nichols & Marko 2019, Shinzato 
et al. 2021), while barcoding alternate mitochondrial loci such as 12S, 16S, mitochondrial protein-coding
gene (msh1), COI intergenic region (igr1), or nuclear gene regions, such as 18S, 28S, and
internal transcribed spacers (ITS), have led to improved resolution for delimiting flatworm and 
coral species (Afiq-Rosli et al. 2019, Ip et al. 2022b, Vanhove et al. 2013). However, some of these 
alternative mitochondrial gene loci for most marine phyla remain poorly represented (Figure 3B and 
C), highlighting an urgent need to ramp up efforts for more barcoding work or mitogenomic skim- 
ming and assembly. Furthermore, non-specific amplification can be circumvented with blocking 
primers or bioinformatic filtering to remove non-target DNA and sequencing reads in downstream 
analyses, which would enhance species detection success (Piñol et al. 2015, Huggins et al. 2020,
Rabbani et al. 2021). 


Read counts for quantitative estimates 


An intensely debated topic involves the utilisation of metabarcoding sequence data for quantitative 
estimates of species abundance (Kelly 2016), which is highly informative for mapping home ranges 
and distribution patterns (Barnes & Turner 2016), among others. Critically, raw read numbers for 
each species may not be directly proportional to its biomass or abundance (Evans et al. 2016, Kelly 
2016). Studies have been divided in their findings; for instance, Lamb et al. (2019) found a weak
relationship between eDNA read counts and organism biomass, but Li et al. (2021) reported otherwise
for large-sized organisms’ eDNA. While some groups have shown the incongruence between the 
amount of sequence reads and absolute abundance of morphological data (Evans et al. 2016, Kelly 
2016), others have reported congruent patterns between molecular (relative abundance) and mor- 
phological derived abundance data for marine plankton, diatoms, fish, and invertebrates (Abad et al. 
2016, Aylagas & Rodriguez-Ezpeleta 2016, Kimmerling et al. 2018, Vasselon et al. 2018, Hoshino 
et al. 2021). Strong positive correlations between read counts and relative species abundances have 
also been demonstrated while tracking ecologically significant events like mass spawning (Bista 
et al. 2018, Tillotson et al. 2018, Rourke et al. 2022, Ip et al. 2022b) and characterising spatiotemporal
dispersal patterns of fish larvae (Kimmerling et al. 2018). In particular, Nichols and Marko
(2019) highlighted the potential for using eDNA sequence abundances to estimate reef coral cover,
although West et al. (2022) warned against applying read count data for this purpose, especially
at more diverse sites. Moreover, Yates et al. (2019) found stronger positive correlations in ex
situ experiments, but failed to replicate the same trends with in situ experiments. These inconsistencies
in read counts for abundance inferences can be attributed to the use of PCR for metabarcoding,
which generates biases by impairing the proportional relationship between the quantity of DNA
pre- and post-amplification (van der Loos & Nijland 2021). Additionally, different metabarcoding
primers may perform differently with the template DNA in mixed-sample pools (Hajibabaei et al.
2019), and primers with fewer mismatches with the target loci of template DNA generally yield
more reliable quantitative results (Piñol et al. 2019).

These limitations have motivated the normalisation of read count data to be used as a rough 
index for abundance measures (Lawson Handley et al. 2019, Laporte et al. 2021), allowing for inferences
of relative, rather than absolute, abundances. Accordingly, a multimarker approach using
species-specific primers (Beng & Corlett 2020, van der Loos & Nijland 2021) and read count
normalisation (Laporte et al. 2021) is necessary for reducing potential biases from PCR and primer 
choice. PCR-free and shotgun sequencing can also eliminate PCR biases, and these have also suc- 
cessfully demonstrated positive correlations between read counts and abundance estimation (Bista 
et al. 2018, Ji et al. 2020). These precautionary alternatives appear effective, as more than 90% of 
DNA metabarcoding studies in the past 2 years have demonstrated positive correlations between 
normalised read counts and relative taxon abundances (Rourke et al. 2022). Future studies can 
employ Hellinger transformation of read count data (Laporte et al. 2021) or include internal stan- 
dard DNA for copy number correction (Ushio et al. 2018) to improve the reliability of species 
abundance inferences. 
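
For readers unfamiliar with the normalisation mentioned above, the Hellinger transformation takes the square root of each taxon's relative read abundance within a sample. The sketch below shows this for a single sample; the function names and read counts are hypothetical, and it does not implement the internal-standard copy number correction, which requires spike-in data.

```python
import math


def relative_abundance(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw read counts for one sample into proportions of the sample total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def hellinger(counts: dict[str, int]) -> dict[str, float]:
    """Hellinger transformation: square root of each taxon's relative abundance."""
    return {taxon: math.sqrt(p) for taxon, p in relative_abundance(counts).items()}


# Hypothetical read counts for two taxa in one sample.
sample_counts = {"taxon_1": 25, "taxon_2": 75}
sample_hellinger = hellinger(sample_counts)
```

Because the transformed values are square roots of proportions, their squares sum to one within each sample, which dampens the influence of highly abundant taxa when samples are compared.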


Environmental RNA for differentiating between living and dead organisms


Environmental DNA has been shown to mostly exist in the environment as 1-10 µm sized particles
(Barnes et al. 2021, Jo et al. 2022a), suggesting that DNA from macroorganisms is likely present as
subcellular tissue fragments and intracellular DNA from shed cells (Moushomi et al. 2019). As such, eDNA
is frequently used as a proxy for detecting organism presence, although shedding and decay models
have shown high variability in eDNA degradation and persistence profiles between different species
and environment types (Harrison et al. 2019, Zhao et al. 2021). Although a few studies have reported short
persistence times of less than 8 hours for eDNA from dead organisms (Ely et al. 2021), the majority have highlighted
that one of DNA metabarcoding’s key limitations is its inability to differentiate between the living 
and the dead in environmental samples, resulting in false positive results as the detected organisms 
could have been dead or inactive at the point of sample collection (Jo et al. 2022b, Pochon et al. 2017, 
Marshall et al. 2021). Environmental RNA (eRNA) has been proposed as an alternative to comple- 
ment eDNA-based monitoring programmes (Giroux et al. 2022, Greco et al. 2022, Yates et al. 2021), 
and recent studies have reported eRNA’s ability to distinguish between living and dead (legacy) sig- 
nals (Giroux et al. 2022, Greco et al. 2022, Pochon et al. 2017, Marshall et al. 2021). Conceptually, 
because RNA is physicochemically more unstable and easily degradable compared to DNA, it does
not remain for long in the environment (Jo et al. 2022b, Kagzi et al. 2022). Therefore, eRNA is
expected to provide a more reliable ‘time stamp’ for eDNA-detected signals (Jo et al. 2022b, Pochon
et al. 2017, Kagzi et al. 2022). However, this may not always be the case as Wood et al. (2020) found 
that eDNA and eRNA shared similar half-life profiles. Depending on the target organism’s physi- 
ological state, there could also be prolonged eRNA detection due to upregulation of selected genes in 
response to environmental stimuli (Cristescu 2019, Jo et al. 2022b, Ikert et al. 2021). Nevertheless, the 
majority of the studies that have compared eRNA and eDNA performances have presented evidence 
that eRNA results are more precise in recognising only signals from living organisms (Giroux et al. 
2022, Greco et al. 2022, López-Escardó et al. 2018, Miyata et al. 2021). 

Although eRNA metabarcoding is a viable alternative to eDNA metabarcoding, the laborious nature and
logistical challenges of eRNA experiments (extra steps in preservation and wet-lab processing) obstruct
its widespread applicability in research and routine monitoring programmes. Due to the unstable
nature of RNA, proper preservation of bulk and environmental samples for RNA work is more 
demanding, including field requirements, such as flash freezing with liquid nitrogen, or specialised 
and costly preservatives like RNAlater (Invitrogen) to maintain RNA integrity between sample 
collection and storage (Passow et al. 2019). If these logistical challenges can be addressed, the 
concurrent use of eRNA and eDNA tools will surely advance the feasibility of genomic tools for 
environmental monitoring (Von Ammon et al. 2019, Greco et al. 2022, Marshall et al. 2021). Most 
importantly, eRNA is highly relevant for studies that need to determine the viability of detected sig- 
nals, such as for invasive species management and endangered species monitoring (Sepulveda et al. 
2020, Farrell et al. 2021, Veilleux et al. 2021).


Advancing field-based molecular applications


Despite the performance advantages of HTS, the molecular processes are nonetheless restricted to 
lab-based settings, while data generation and bioinformatic analyses remain limited by sequenc- 
ing run times and computational power (Table 1). Recently, portable- and non-sequencing-related 
innovations have enabled in situ field molecular experiments, accelerating the rate of biodiversity 
assessments (Figure 2) (Baerwald et al. 2020, But et al. 2020, Chang et al. 2020b, Doi et al. 2021). 
Field-based applications eliminate the need for storage and transport to the laboratory as process- 
ing work can be done in situ, thereby increasing efficiency and output through time savings (Chang 
et al. 2020b, Doi et al. 2021) and extending the range (distance and time) of sampling expeditions 
(Baerwald et al. 2020). Clearly, field-ready equipment and reagents are needed, such as BentoLab,
miniPCR™, and Genie® for molecular processes like DNA extraction and PCR. The MinION sequencer,
portable qPCR devices such as two3™ and Genie®, and commercial colorimetric and lateral flow kits are
required for field sequencing analyses and for visualising data output (Baerwald et al. 2020, But et al.
2020, Doi et al. 2021). Unless using equipment that includes a mini-centrifuge, e.g., BentoLab, 
reagents should preferably require minimal centrifugation in the field, such as the QuickExtract™ 
(Lucigen) for DNA extractions, Recombinase Polymerase Amplification (RPA, TwistDx) reagents 
for isothermal DNA extraction/amplification, and AMPure XP (Beckman Coulter) magnetic beads 
for clean-up procedures (Chang et al. 2020b). While the MinION sequencer, Genie®, and two3™ 
qPCR thermocyclers allow real-time analyses of high-throughput sequence and species detection 
data without reliance on a laboratory (Doi et al. 2021, Pomerantz et al. 2022), we note that these are
sophisticated pieces of equipment. It may also be challenging to visualise data or explore bioinformatics in
depth in the field.

User-friendly applications without the need for electrical power, portable thermocyclers, or 
specialised equipment are increasingly available and will be highly advantageous for research in 
developing countries or remote survey sites where lab-based experiments are not cost-effective 
or logistically feasible. In particular, loop-mediated isothermal amplification (LAMP) assays with 
colorimetric kits (But et al. 2020, Porco et al. 2022) and the clustered regularly interspaced short
palindromic repeats (CRISPR)-based specific high-sensitivity enzymatic reporter unlocking
(SHERLOCK) method with lateral flow assays (LFA) (Baerwald et al. 2020) are available for
easy data visualisation, although these are currently appropriate only for species-specific detection 
rather than whole-community analyses. SHERLOCK-LFA is a remarkably sensitive detection tool 
that has been highly successful as a non-invasive molecular tool for field-based species identifica- 
tion (Baerwald et al. 2020), but its potential for targeted species detection in bulk or environmental 
DNA metabarcoding studies has not yet been explored. Nevertheless, given the recent successful 
optimisations of eDNA detection of important food fish species with CRISPR from water samples 
(Williams et al. 2019, 2021), we would also expect SHERLOCK-LFA to achieve similar
success in environmental monitoring in the near future. These ongoing developments are exciting
prospects for field molecular techniques to further enhance biomonitoring efficiency, biodiversity 
assessments, species discovery, documentation, and diversity estimation (Creer et al. 2016, Deiner 
et al. 2017, Hering et al. 2018, West et al. 2021). 


Research gaps and future priorities 


The immediate priorities for enhancing the ecological applications of Biomonitoring 2.0 are to
extend the utility of molecular tools for robust species monitoring and to continue testing the
feasibility of, and developing more reliable solutions for, using sequencing reads to estimate abundances of
target organisms. Recent developments in the field have seen expansions in the scope and range 
of species monitoring. This is evident from the increasing number of studies applying the
Biomonitoring 2.0 framework and conducting molecular work outside of lab-based settings in the 
field (Chang et al. 2020b, Pomerantz et al. 2022). Molecular processes have also been streamlined to
be more user-friendly, so that amateur and citizen scientists can also contribute to species monitor- 
ing efforts (Larson et al. 2020, Miya et al. 2022). 

Although species abundance is crucial for numerous ecological applications, the inference 
of relative abundances from sequence read counts remains a challenge (Pawlowski et al. 2018). 
Currently, there is no straightforward remedy for this sequencing quantification and abundance 
estimate issue. We acknowledge these limitations and recognise that it is technically impractical to 
infer absolute abundance, and only relative abundance estimates may be credible from standardised 
and quality-controlled sequencing read count data. As such, the most reliable uses of sequencing
data for ecological applications remain determining species presence or absence, characterising
community composition, determining multispecies site occupancy, inferring rough relative abun- 
dances, as well as evaluating food webs and species co-occurrences (Kang et al. 2022, Lawson 
Handley et al. 2019, Valdivia-Carrillo et al. 2021). Future studies that use sequencing read
counts for relative abundance measures should consider correction factors in their experimental
designs for both the wet- and dry-lab components. To improve the accuracy of relative abundance 
estimates, internal positive controls (Ushio et al. 2018), application of quantitative sequencing tech- 
niques (Hoshino et al. 2021), and normalisation or transformation of read count data during bioin- 
formatic analyses need to be considered and tested (Lawson Handley et al. 2019, Laporte et al. 2021, 
Rourke et al. 2022). 
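
The read-count corrections mentioned above can be illustrated with a minimal Python sketch. It is not the method of any study cited here: the function names, the pseudocount of 0.5, the example taxa, and the use of a single spike-in standard with a known copy number are all assumptions made for illustration. The sketch shows three common operations: conversion of raw reads to proportions, a centred log-ratio (CLR) transformation, and scaling of reads to estimated copy numbers with one internal standard.

```python
import math


def relative_abundance(counts):
    """Convert raw read counts per taxon into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def clr_transform(counts, pseudocount=0.5):
    """Centred log-ratio transform of read counts.

    The pseudocount (an assumed value) avoids log(0) for taxa with no reads.
    """
    logs = {taxon: math.log(n + pseudocount) for taxon, n in counts.items()}
    mean_log = sum(logs.values()) / len(logs)
    return {taxon: value - mean_log for taxon, value in logs.items()}


def copies_from_spike_in(counts, spike_taxon, spike_copies):
    """Scale reads to estimated copies using one internal standard.

    Assumes (hypothetically) that reads scale linearly with template copies
    and that a known number of standard copies was added before PCR.
    """
    spike_reads = counts[spike_taxon]
    if spike_reads == 0:
        raise ValueError("internal standard was not detected")
    copies_per_read = spike_copies / spike_reads
    return {
        taxon: n * copies_per_read
        for taxon, n in counts.items()
        if taxon != spike_taxon
    }


# Hypothetical read counts for one sample.
reads = {"taxon_a": 600, "taxon_b": 300, "taxon_c": 100}
proportions = relative_abundance(reads)
clr_values = clr_transform(reads)

# The same sample with a hypothetical spike-in of 100 copies yielding 200 reads.
reads_with_spike = dict(reads, spike_in=200)
estimated_copies = copies_from_spike_in(reads_with_spike, "spike_in", 100)
```

In practice, amplification bias differs among taxa, so such estimates would still need validation against mock communities or independent abundance data before ecological interpretation.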


Conclusions and outlook 


Recent advancements in molecular laboratory processes, DNA sequencing technologies, and bio- 
informatics have all engendered integrative genomic approaches that can be leveraged for species 
discovery to keep pace with and even outpace the rate of defaunation and extinction (Dirzo et al. 
2014). DNA barcoding and metabarcoding have been increasingly applied in the last decade to 
establish biodiversity baselines (Figure 1), which can be used to flag environmental changes and 
allow prompt implementation of management measures (Figure 2). Consequently, these approaches 
have proved highly effective in expediting species discovery and identification, demonstrating
capabilities in rediscovering locally missing taxa, flagging potential new records, and drawing attention
to neglected marine species diversity. Besides updating biodiversity estimates, optimised genomic
approaches have immense potential for investigating food web interactions and studying organism 
responses to climate change and other anthropogenic impacts. 

The growing number of threatened species worldwide (Davidson & Dulvy 2017) calls for cost- 
effective and reliable methods for mapping species distributions and diversity to inform manage- 
ment decisions. Fortunately, a decade’s worth of progress with Biomonitoring 2.0 has demonstrated 
numerous innovative applications for effective species monitoring of marine habitats. The number 
of DNA barcoding and metabarcoding applications continues to rise (Figure 1), and with the limi- 
tations continually being addressed, we foresee that genomic tools for large-scale environmental 
monitoring will continue to be optimised, streamlined, and eventually mainstreamed. The founda- 
tional role of database assembly with DNA barcodes of properly identified, vouchered specimens 
for biodiversity studies cannot be neglected (Chang et al. 2020a). Therefore, more work is needed to 
expand and upgrade reference databases for robust sequence matching (Porter & Hajibabaei 2018b, 
Figure 3), which in turn will help improve interpretation of molecular data and draw more meaning- 
ful ecological conclusions. 

The apparent trends of falling sequencing costs and rising sequence data yield will drive
expansion and diversification of ecological applications (Grant et al. 2021), requiring novel study
approaches. As such, the experimental challenges are shifting from sequence throughput
to bioinformatic analyses and the drawing of biological inferences from the increasingly complex
molecular data output. Biomonitoring 2.0 has created a novel form of ‘vouchers’, where raw
sequence data and nucleic acid extracts from environmental samples can be considered as ‘com- 
munity-level vouchers’ that should be viewed as analogous to the museum-archived specimen 
vouchers. These materials are frozen in time as they store a historical snapshot of the commu- 
nity, which can be revisited for establishing baselines and time series in future studies. While 
improvements in sequencing technology have motivated advances in computational biology
for effective data analyses, other molecular techniques have also been developed to simplify
analytical processes while accomplishing similar research or surveillance goals, including quantitative
and droplet digital PCR (qPCR and ddPCR, respectively) (Schweiss et al. 2020, Kwong
et al. 2021, Yip et al. 2021), LAMP (But et al. 2020, Porco et al. 2022), and SHERLOCK-LFA 
(Baerwald et al. 2020). 

Biomonitoring 2.0 has demonstrated broad applicability for a wide range of environmental 
research. Importantly, it has enhanced the scale and scope of genomic research for numerous 
ecological applications. Capitalising on the current technological trajectory, which is centred on 
upscaling data throughput and automation, and in lockstep with the rapid progress in genomic 
approaches, reference database expansion, and ecological modelling, Biomonitoring 2.0 will con- 
tinue to push the field towards new frontiers (Cordier et al. 2019, Compson et al. 2020, Havermans 
et al. 2022). 